Abstract

In this paper, we try to further demonstrate that 
the models of random CSP instances proposed by 
[Xu and Li, 2000; Xu and Li, 2003] are of theoretical
and practical interest. Indeed, these models,
called RB and RD, present several nice features. 
First, it is quite easy to generate random instances 
of any arity since no particular structure has to be 
integrated, or property enforced, in such instances. 
Then, the existence of an asymptotic phase tran- 
sition can be guaranteed while applying a limited 
restriction on domain size and on constraint tight- 
ness. In that case, a threshold point can be precisely 
located and all instances have the guarantee to be 
hard at the threshold, i.e., to have an exponential 
tree-resolution complexity. Next, a formal analy- 
sis shows that it is possible to generate forced sat- 
isfiable instances whose hardness is similar to un- 
forced satisfiable ones. This analysis is supported 
by some representative results taken from an inten- 
sive experimentation that we have carried out, using 
complete and incomplete search methods. 

1 Introduction 

Over the past ten years, the study of phase transition phenom- 
ena has been one of the most exciting areas in Computer Sci- 
ence and Artificial Intelligence. Numerous studies have es- 
tablished that for many NP-complete problems (e.g., SAT and 
CSP), the hardest random instances occur, while a control pa- 
rameter is varied accordingly, between an under-constrained 
region where all instances are almost surely satisfiable and an 
over-constrained region where all instances are almost surely 
unsatisfiable. In the transition region, there is a threshold 
where half the instances are satisfiable. Generating hard in- 
stances is important both for understanding the complexity 
of the problems and for providing challenging benchmarks 
[Cook and Mitchell, 1997].

Another remarkable advance in Artificial Intelligence has
been the development of incomplete algorithms for various
kinds of problems. Since then, one important issue has been
to produce hard satisfiable instances in order to evaluate the
efficiency of such algorithms: the approach that consists in
using a complete algorithm to keep only the random satisfiable
instances generated at the threshold can only be applied to
instances of limited size. Also, it has been shown that
generating hard (forced) satisfiable instances is related to
some open problems in cryptography, such as computing a
one-way function [Impagliazzo et al., 1989; Cook and Mitchell, 1997].

In this paper, we mainly focus on random CSP (Constraint
Satisfaction Problem) instances. Initially, four "standard"
models, denoted A, B, C and D [Smith and Dyer, 1996;
Gent et al., 2001], have been introduced to generate random
binary CSP instances. However, [Achlioptas et al., 1997]
have identified a shortcoming of all these models. Indeed, 
they prove that random instances generated using these mod- 
els suffer from (trivial) unsatisfiability as the number of vari- 
ables increases. To overcome the deficiency of these standard 
models, several alternatives have been proposed. 

On the one hand, [Achlioptas et al., 1997] have proposed a
model E and [Molloy, 2003] a generalized model. However,
model E does not allow the density of the instances to be tuned,
and the generalized model requires an awkward exploitation
of probability distributions. Other alternatives incorporate
some "structure" in the generated random instances. Roughly
speaking, they ensure that the generated instances are arc
consistent [Gent et al., 2001] or path consistent
[Gao and Culberson, 2004]. The main drawback of all these
approaches is that generating random instances is no longer
a natural and easy task.

On the other hand, [Xu and Li, 2000; Xu and Li, 2003],
[Frieze and Molloy, 2003] and [Smith, 2001] have revisited
standard models by controlling the way parameters change
as the problem size increases. The alternative model D
scheme of [Smith, 2001] guarantees the occurrence of a phase
transition when some parameters are controlled and when
the constraint tightness is within a certain range. The two
revised models, called RB and RD, of [Xu and Li, 2000;
Xu and Li, 2003] provide the same guarantee by varying one
of two control parameters around a critical value that, in
addition, can be computed. Also, [Frieze and Molloy, 2003]
identify a range of suitable parameter settings in order to exhibit a
non-trivial threshold of satisfiability. Their theoretical results
apply to binary instances taken from model A and to "symmetric"
binary instances from a so-called model B which, not
corresponding to the standard one, associates the same relation
with every constraint.



The models RB and RD present several nice features: 

• it is quite easy to generate random instances of any arity 
as no particular structure has to be integrated, or prop- 
erty enforced, in such instances. 

• the existence of an asymptotic phase transition can be
guaranteed while applying a limited restriction on domain
size and on constraint tightness. For instances involving
constraints of arity k, the domain size is required
to be greater than the k-th root of the number of variables
and the (threshold value of the) constraint tightness is
required to be at most $\frac{k-1}{k}$.

• when the asymptotic phase transition exists, a threshold 
point can be precisely located, and all instances gener- 
ated following models RB and RD have the guarantee 
to be hard at the threshold, i.e., to have an exponential 
tree-resolution complexity. 

• it is possible to generate forced satisfiable instances 
whose hardness is similar to unforced satisfiable ones. 

This paper is organized as follows. After introducing models
RB and RD, as well as some theoretical results (Section 2),
we provide a formal analysis about generating both forced
and unforced hard satisfiable instances (Section 3). Then, we
present the results of a large series of experiments that we
have conducted (Section 4), and, before concluding, we discuss
some related work (Section 5).

2 Theoretical background 

A constraint network consists of a finite set of variables such 
that each variable X has an associated domain denoting the 
set of values allowed for X, and a finite set of constraints
such that each constraint C has an associated relation denot- 
ing the set of tuples allowed for the variables involved in C. 

A solution is an assignment of values to all the variables 
such that all the constraints are satisfied. A constraint network 
is said to be satisfiable (sat, for short) if it admits at least one
solution. The Constraint Satisfaction Problem (CSP), whose
task is to determine if a given constraint network, also called 
CSP instance, is satisfiable, is NP-complete. 

In this section, we introduce some theoretical results taken 
from [Xu and Li, 2000; Xu and Li, 2003]. First, we introduce
a model, denoted RB, that represents an alternative to model 
B. Note that, unlike model B, model RB allows selecting con- 
straints with repetition. But the main difference of model RB 
with respect to model B is that the domain size of each vari- 
able grows polynomially with the number of variables. 

Definition 1 (Model RB) A class of random CSP instances
of model RB is denoted $RB(k,n,\alpha,r,p)$ where:

• $k \ge 2$ denotes the arity of each constraint,

• n denotes the number of variables,

• $\alpha > 0$ determines the domain size $d = n^\alpha$ of each
variable,

• $r > 0$ determines the number $m = r \cdot n \cdot \ln n$ of
constraints,

• $0 < p < 1$ denotes the tightness of each constraint.



To build one instance $P \in RB(k,n,\alpha,r,p)$, we select with
repetition m constraints, each one formed by selecting k distinct
variables and $p \cdot d^k$ distinct unallowed tuples (as p denotes a
proportion).
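
The construction just described can be sketched in Python as follows. This is only a minimal illustration, not the authors' generator: the function name `generate_rb`, the (scope, nogoods) representation of a constraint, and the rounding of d, m and p·d^k to the nearest integer are all assumptions made here.

```python
import math
import random


def generate_rb(k, n, alpha, r, p, seed=None):
    """Return (d, constraints) for one instance of RB(k, n, alpha, r, p).

    Each constraint is a pair (scope, nogoods): scope is a tuple of k
    distinct variable indices, nogoods is a set of unallowed value tuples.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    rng = random.Random(seed)
    d = round(n ** alpha)                 # domain size d = n^alpha
    m = round(r * n * math.log(n))        # number of constraints m = r.n.ln n
    nb_nogoods = round(p * d ** k)        # p is a proportion of the d^k tuples
    constraints = []
    for _ in range(m):                    # constraints are drawn with repetition
        scope = tuple(rng.sample(range(n), k))    # k distinct variables
        nogoods = set()
        while len(nogoods) < nb_nogoods:          # distinct unallowed tuples
            nogoods.add(tuple(rng.randrange(d) for _ in range(k)))
        constraints.append((scope, nogoods))
    return d, constraints
```

For example, `generate_rb(2, 40, 0.8, 0.8 / math.log(4 / 3), 0.25)` yields d = 19 and 410 constraints under this rounding convention.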

When fixed, $\alpha$ and r give an indication about the growth
of the domain sizes and of the number of constraints as n
increases since $d = n^\alpha$ and $m = r n \ln n$, respectively. It is
then possible, for example, to determine the critical value $p_{cr}$
of p where the hardest instances must occur. Indeed, we have
$p_{cr} = 1 - e^{-\alpha/r}$, which is equivalent to the expression of $p_{cr}$
given by [Smith and Dyer, 1996].
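
As a quick numerical check of the critical tightness, the sketch below evaluates 1 − e^(−α/r) for the three parameter settings used in the experiments of Section 4. The helper name `p_critical` is hypothetical and introduced only for this illustration.

```python
import math


def p_critical(alpha, r):
    """Critical tightness p_cr = 1 - exp(-alpha / r) of model RB."""
    return 1.0 - math.exp(-alpha / r)


# (alpha, r) settings taken from the experimental section of the paper.
for alpha, r in [(0.8, 3.0), (1.0, 1.0), (0.8, 1.5)]:
    print(f"alpha={alpha}, r={r}: p_cr = {p_critical(alpha, r):.3f}")
# prints 0.234, 0.632 and 0.413, respectively
```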

Another model, denoted model RD, is similar to model RB 
except that p denotes a probability instead of a proportion. 
For convenience, in this paper, we will exclusively refer to 
model RB although all given results hold for both models. 

In [Xu and Li, 2000], it is proved that model RB, under certain
conditions, not only avoids trivial asymptotic behaviors
but also guarantees exact phase transitions. More precisely,
with Pr denoting a probability distribution, the following
theorems hold.

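Theorem 1 If k, $\alpha > \frac{1}{k}$ and $p \le \frac{k-1}{k}$ are constants then

$$\lim_{n \to \infty} \Pr[P \in RB(k,n,\alpha,r,p) \text{ is sat}] = \begin{cases} 1 & \text{if } r < r_{cr} \\ 0 & \text{if } r > r_{cr} \end{cases}$$

where $r_{cr} = -\frac{\alpha}{\ln(1-p)}$.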

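Theorem 2 If k, $\alpha > \frac{1}{k}$ and $p_{cr} \le \frac{k-1}{k}$ are constants then

$$\lim_{n \to \infty} \Pr[P \in RB(k,n,\alpha,r,p) \text{ is sat}] = \begin{cases} 1 & \text{if } p < p_{cr} \\ 0 & \text{if } p > p_{cr} \end{cases}$$

where $p_{cr} = 1 - e^{-\alpha/r}$.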

Remark that the condition $p_{cr} \le \frac{k-1}{k}$ is equivalent to
$k e^{-\alpha/r} \ge 1$ given in [Xu and Li, 2000]. Theorems 1 and 2
indicate that a phase transition is guaranteed provided that the
domain size is not too small and the constraint tightness or the
threshold value of the constraint tightness not too large. As
an illustration, for instances involving binary (resp. ternary)
constraints, the domain size is required to be greater than the
square (resp. cubic) root of the number of variables and the
constraint tightness or threshold value of the tightness is
required to be at most 50% (resp. ≈ 66%).
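
The conditions of Theorems 1 and 2 are simple enough to test mechanically. The following sketch is one possible way to do so; the function names `r_critical` and `transition_guaranteed` are hypothetical, and `p` stands for the tightness (Theorem 1) or the threshold tightness (Theorem 2).

```python
import math


def r_critical(alpha, p):
    """Critical ratio r_cr = -alpha / ln(1 - p) of Theorem 1."""
    return -alpha / math.log(1.0 - p)


def transition_guaranteed(k, alpha, p):
    """Check the sufficient conditions alpha > 1/k and p <= (k-1)/k."""
    return alpha > 1.0 / k and p <= (k - 1) / k


# Binary constraints: alpha must exceed 1/2 and the tightness must be <= 50%.
print(transition_guaranteed(2, 0.8, 0.25))   # conditions hold
print(transition_guaranteed(2, 0.4, 0.25))   # domain grows too slowly
print(transition_guaranteed(2, 0.8, 0.60))   # tightness too large
```

Note that these conditions are sufficient ones taken from the theorems; the sketch says nothing about what happens when they are violated.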

The following theorem establishes that unsatisfiable in- 
stances of model RB almost surely have the guarantee to be 
hard. A similar result for model A has been obtained by 
[Frieze and Molloy, 2003] with respect to binary instances.

Theorem 3 If $P \in RB(k,n,\alpha,r,p)$ and k, $\alpha$, r and p are
constants, then, almost surely¹, P has no tree-like resolution of
length less than $2^{\Omega(n)}$.

The proof, which is based on a strategy following some results
of [Ben-Sasson and Wigderson, 2001; Mitchell, 2002],
is omitted but can be found in [Xu and Li, 2003].



¹ We say that a property holds almost surely when this property
holds with probability tending to 1 as the number of variables tends
to infinity.



To summarize, model RB guarantees exact phase transitions
and hard instances at the threshold. It then contradicts
the statement of [Gao and Culberson, 2004] about the
requirement of an extremely low tightness for all existing random
models in order to have non-trivial threshold behaviors
and guaranteed hard instances at the threshold.

3 Generating hard satisfiable instances 

For CSP and SAT, there is a natural strategy to generate forced 
satisfiable instances, i.e., instances on which a solution is im- 
posed. It suffices to proceed as follows: first generate a ran- 
dom (total) assignment t and then generate a random instance 
with n variables and m constraints (clauses for SAT) such 
that any constraint violating t is rejected; t is then a forced
solution. This strategy, quite simple and easy to implement, 
allows generating hard forced satisfiable instances of model 
RB provided that Theorem 1 or 2 holds. Nevertheless, this 
statement deserves a theoretical analysis. 
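
The strategy described above can be sketched for model RB as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `generate_forced_rb`, the (scope, nogoods) representation, the rounding to the nearest integer, and the choice to redraw a whole constraint whenever it violates t are all choices made here.

```python
import math
import random


def generate_forced_rb(k, n, alpha, r, p, seed=None):
    """Return (d, t, constraints) where t is a forced solution.

    A random total assignment t is drawn first; every randomly drawn
    constraint that violates t is rejected and drawn again.
    """
    rng = random.Random(seed)
    d = round(n ** alpha)
    m = round(r * n * math.log(n))
    nb_nogoods = round(p * d ** k)
    t = [rng.randrange(d) for _ in range(n)]      # random total assignment
    constraints = []
    while len(constraints) < m:
        scope = tuple(rng.sample(range(n), k))
        nogoods = set()
        while len(nogoods) < nb_nogoods:
            nogoods.add(tuple(rng.randrange(d) for _ in range(k)))
        if tuple(t[x] for x in scope) in nogoods:
            continue                              # constraint violates t: reject
        constraints.append((scope, nogoods))
    return d, t, constraints
```

Since a random constraint violates t with probability about p < 1, the rejection loop terminates quickly in practice.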

Assuming that d denotes the domain size (d = 2 for SAT),
we have exactly $d^n$ possible (total) assignments, denoted by
$t_1, t_2, \ldots, t_{d^n}$, and $d^{2n}$ possible assignment pairs where an
assignment pair $\langle t_i, t_j \rangle$ is an ordered pair of two assignments
$t_i$ and $t_j$. We say that $\langle t_i, t_j \rangle$ satisfies an instance if
and only if both $t_i$ and $t_j$ satisfy the instance. Then, the
expected (mean) number of solutions $E_f[N]$ for instances that
are forced to satisfy an assignment $t_i$ is:

$$E_f[N] = \sum_{j=1}^{d^n} \frac{\Pr[\langle t_i, t_j \rangle]}{\Pr[\langle t_i, t_i \rangle]}$$

where $\Pr[\langle t_i, t_j \rangle]$ denotes the probability that $\langle t_i, t_j \rangle$
satisfies a random instance. Note that $E_f[N]$ should be
independent of the choice of the forced solution $t_i$. So we have:



$$E_f[N] = \frac{\sum_{1 \le i,j \le d^n} \Pr[\langle t_i, t_j \rangle]}{d^n \Pr[\langle t_i, t_i \rangle]} = \frac{E[N^2]}{E[N]}$$

where $E[N^2]$ and $E[N]$ are, respectively, the second moment
and the first moment of the number of solutions for random
unforced instances.

For random 3-SAT, it is known that the strategy mentioned
above is unsuitable as it produces a biased sampling
of instances with many solutions clustered around t
[Achlioptas et al., 2000]. Experiments show that forced
satisfiable instances are much easier to solve than unforced
satisfiable instances. In fact, it is not hard to show that,
asymptotically, $E[N^2]$ is exponentially greater than $E^2[N]$. Thus, the
expected number of solutions for forced satisfiable instances
is exponentially larger than the one for unforced satisfiable
instances. It then gives a good theoretical explanation of why,
for random 3-SAT, the strategy is highly biased towards
generating instances with many solutions.

For model RB, recall that when the exact phase transitions
were established [Xu and Li, 2000], it was proved
that $E[N^2]/E^2[N]$ is asymptotically equal to 1 below the
threshold, where almost all instances are satisfiable, i.e.
$E[N^2]/E^2[N] \approx 1$ for $r < r_{cr}$ or $p < p_{cr}$. Hence, the
expected number of solutions for forced satisfiable instances
below the threshold is asymptotically equal to the one for
unforced satisfiable instances, i.e. $E_f[N] = E[N^2]/E[N] \approx E[N]$.
In other words, when using model RB, the strategy
has almost no effect on the number of solutions and does not
lead to a biased sampling of instances with many solutions.

In addition to the analysis above, we can also study the
influence of the strategy on the distribution of solutions with
respect to the forced solution. We first define the distance
$d^f(t_i, t_j)$ between two assignments $t_i$ and $t_j$ as the proportion
of variables that have been assigned a different value in $t_i$ and
$t_j$. We have $0 \le d^f(t_i, t_j) \le 1$.

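For forced satisfiable instances of model RB, with $E_f^\delta[N]$
denoting the expected number of solutions whose distance
from the forced solution (identified as $t_i$, here) is equal to $\delta$,
we obtain by an analysis similar to that in [Xu and Li, 2000]:

$$E_f^\delta[N] = \binom{n}{n\delta} (n^\alpha - 1)^{n\delta} \left[ \frac{\binom{n - n\delta}{k}}{\binom{n}{k}} + (1-p)\left(1 - \frac{\binom{n - n\delta}{k}}{\binom{n}{k}}\right) \right]^{r n \ln n}$$

$$= \exp\left[ n \ln n \left( r \ln\left(1 - p + p(1-\delta)^k\right) + \alpha\delta \right) + O(n) \right].$$

It can be shown, from the results in [Xu and Li, 2000], that
$E_f^\delta[N]$, for $r < r_{cr}$ or $p < p_{cr}$, is asymptotically maximized
when $\delta$ takes the largest possible value, i.e. $\delta = 1$.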

For unforced satisfiable instances of model RB, with
$E^\delta[N]$ denoting the expected number of solutions whose
distance from $t_i$ (not necessarily a solution) is equal to $\delta$, we
have:

$$E^\delta[N] = \binom{n}{n\delta} (n^\alpha - 1)^{n\delta} (1-p)^{r n \ln n} = \exp\left[ n \ln n \left( r \ln(1-p) + \alpha\delta \right) + O(n) \right].$$

It is straightforward to see that the same pattern holds for this
case, i.e. $E^\delta[N]$ is asymptotically maximized when $\delta = 1$.
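
The claim that both expectations peak at distance 1 can be checked numerically on the leading exponents, i.e. the coefficients of n ln n in the two asymptotic expressions. The sketch below does this on a grid of distances; the function names and the parameter values (k = 2, α = 0.8, r = 1.5 and p = 0.3, a tightness chosen below the threshold) are assumptions made for illustration.

```python
import math


def forced_exponent(delta, k, alpha, r, p):
    """Coefficient of n ln n for forced instances at distance delta."""
    return r * math.log(1.0 - p + p * (1.0 - delta) ** k) + alpha * delta


def unforced_exponent(delta, alpha, r, p):
    """Coefficient of n ln n for unforced instances at distance delta."""
    return r * math.log(1.0 - p) + alpha * delta


def argmax_on_grid(f, steps=1000):
    """Return the grid point of [0, 1] where f is largest."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=f)


k, alpha, r, p = 2, 0.8, 1.5, 0.3
best_forced = argmax_on_grid(lambda x: forced_exponent(x, k, alpha, r, p))
best_unforced = argmax_on_grid(lambda x: unforced_exponent(x, alpha, r, p))
print(best_forced, best_unforced)   # both maxima are reached at distance 1.0
```

At distance 1 the two exponents coincide, since the term p(1 − δ)^k vanishes, which matches the earlier observation that forcing a solution leaves the expected number of solutions asymptotically unchanged below the threshold.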

Intuitively, with model RB, both unforced satisfiable in- 
stances and instances forced to satisfy an assignment t are 
such that most of their solutions distribute far from t. This in- 
dicates that, for model RB, the strategy has little effect on the 
distribution of solutions, and is not biased towards generating 
instances with many solutions around the forced one. 

For random 3-SAT, similarly, we can show that as r (the
ratio of clauses to variables) approaches 4.25, $E_f^\delta[N]$ and
$E^\delta[N]$ are asymptotically maximized when $\delta \approx 0.24$ and
$\delta = 0.5$, respectively. This means, in contrast to model RB,
that when r is near the threshold, most solutions of forced in- 
stances distribute in a place much closer to the forced solution 
than solutions of unforced satisfiable instances. 



4 Experimental results 

As all introduced theoretical results hold when $n \to \infty$, the
practical exploitation of these results is an issue that must be
addressed. In this section, we give some representative
experimental results which indicate that practice meets theory
even if the number n of variables is small. Note that different
values of parameters $\alpha$ and r have been selected in order to
illustrate the broad spectrum of applicability of model RB.

Figure 1: Difference between theoretical and experimental
thresholds against $\alpha$, r and n

First, it is valuable to know, in practice, to what extent
Theorems 1 and 2 give precise thresholds according to
different values of $\alpha$, r and n. The experiments that we have
run with respect to Theorem 2, as depicted in Figure 1, suggest
that, all other parameters being fixed, the greater the value of
$\alpha$, r or n is, the more precise Theorem 2 is. More precisely, in
Figure 1, the difference between the threshold theoretically
located and the threshold experimentally determined is plotted
against $\alpha \in [0.2, 1]$ ($d \in [2..20]$), against $r \in [0.8, 2.5]$
($m \in [50..150]$) and against $n \in [8..100]$. Note that the
vertical scale refers to the difference in constraint tightness and
that the horizontal scale is normalized (value 0 respectively
corresponds to n = 8, $\alpha$ = 0.2 and r = 0.8, etc.).

To solve the random instances generated by model RB, we 
have used a systematic backtracking search algorithm (MAC) 
and a local search algorithm (tabu search). Both algorithms 
have been equipped with a search heuristic that learns from 
conflicts [Boussemart et al., 2004].

We have studied the difficulty of solving with MAC the binary
instances of model RB generated around the theoretical
threshold $p_{cr} \approx 0.23$ given by Theorem 2 for k = 2, $\alpha$ = 0.8,
r = 3 and $n \in \{20, 30, 40\}$. In Figure 2, it clearly appears
that the hardest instances are located quite close to the
theoretical threshold and that the difficulty grows exponentially
with n (note the use of a log scale). It corresponds to a phase
transition (not depicted here, due to lack of space). A similar
behavior is observed in Figure 3 with respect to ternary
instances generated around the theoretical threshold $p_{cr} \approx 0.63$
for k = 3, $\alpha$ = 1, r = 1 and $n \in \{16, 20, 24\}$.

As the number and the distribution of solutions are the two
most important factors determining the cost of solving
satisfiable instances, we can expect, from the analysis given in
Section 3, that for model RB, the hardness of solving forced
satisfiable instances should be similar to that of solving
unforced satisfiable ones. This is what is observed in Figure 2.

To confirm this, we have focused our attention on a point
just below the threshold as we have then some (asymptotic)
guarantee about the difficulty of both unforced and forced
instances (see Theorems 5 and 6 in [Xu and Li, 2003]) and the
possibility of easily generating unforced satisfiable instances.
Figure 4 shows the difficulty of solving with MAC both
forced and unforced instances of model RB at $p_{cr} - 0.01 \approx 0.40$
for k = 2, $\alpha$ = 0.8, r = 1.5 and $n \in [20..50]$.

To confirm the inherent difficulty of the (forced and unforced)
instances generated at the threshold, we have also
studied the runtime distribution produced by a randomized
search algorithm on distinct instances [Gomes et al., 2004].
For each instance, we have performed 5000 independent runs.
Figure 5 displays the survival function, which corresponds to
the probability of a run taking more than x backtracks, of a
randomized MAC algorithm for two representative instances
generated at $p_{cr} \approx 0.41$ for k = 2, $\alpha$ = 0.8, r = 1.5 and
$n \in \{40, 45\}$. One can observe that the runtime distribution
(a log-log scale is used) does not correspond to a heavy-tailed
one, i.e., a distribution characterized by an extremely long tail
with some infinite moment. It means that all runs behave
homogeneously and, therefore, it suggests that the instances are
inherently hard [Gomes et al., 2004].
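
For readers who want to reproduce this kind of plot, the empirical survival function can be computed from the observed run costs as in the sketch below. The function name `survival_function` is hypothetical, and the costs in the usage line are made-up numbers for illustration, not data from the experiments.

```python
def survival_function(costs):
    """Return [(x, Pr[cost > x])] for each distinct observed cost x.

    costs is a list of backtrack counts, one per independent run.
    """
    n = len(costs)
    ordered = sorted(costs)
    points = []
    for i, x in enumerate(ordered):
        if i + 1 < n and ordered[i + 1] == x:
            continue                      # keep only the last copy of a tie
        points.append((x, (n - (i + 1)) / n))
    return points


# Four hypothetical runs: 75% took more than 1 backtrack, 25% more than 2.
print(survival_function([1, 2, 2, 5]))   # [(1, 0.75), (2, 0.25), (5, 0.0)]
```

Plotting these points on a log-log scale gives the kind of curve described above; a heavy tail would show up as a roughly linear decay over several orders of magnitude.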

Then, we have focused on unforced unsatisfiable instances
of model RB as Theorem 3 indicates that such instances have
an exponential resolution complexity. We have generated
unforced and forced instances with different constraint tightness
p above the threshold $p_{cr} \approx 0.41$ for k = 2, $\alpha$ = 0.8, r = 1.5
and $n \in [20..450]$. Figure 6 displays the search effort of
a MAC algorithm to solve such instances against the number
of variables n. It is interesting to note that the search
effort grows exponentially with n, even if the exponent
decreases as the tightness increases. Also, although not currently
supported by any theoretical result (Theorems 5 and 6
of [Xu and Li, 2003] hold only for forced instances below the
threshold), it appears here that forced and unforced instances
have a similar hardness.

Finally, Figure 7 shows the results obtained with a tabu
search with respect to the binary instances that have been
previously considered with MAC (see Figure 2). The search
effort is given by a median cost since when using an incomplete
method, there is absolutely no guarantee of finding a solution
in a given limit of time. Remark that all unsatisfiable
(unforced) instances below the threshold have been filtered out
in order to make a fair comparison. It appears that both
complete and incomplete methods behave similarly. In Figure 7,
one can see that the search effort grows exponentially with n
and that forced instances are as hard as unforced ones.

5 Related work 

As related work, we can mention the recent progress on
generating hard satisfiable SAT instances. [Barthel et al., 2002;
Jia et al., 2004] have proposed to build random satisfiable
3-SAT instances on the basis of a spin-glass model from
statistical physics. Another approach, quite easy to implement, has
also been proposed by [Achlioptas et al., 2004]: any 3-SAT
instance is forced to be satisfiable by forbidding the clauses
violated by an assignment or by its complement.

Finally, let us mention [Achlioptas et al., 2000], who propose
to build random instances with a specific structure,
namely, instances of the Quasigroup With Holes (QWH) 
problem. The hardest instances belong to a new type of phase 
transition, defined from the number of holes, and coincide 
with the size of the backbone. 

6 Conclusion 

In this paper, we have shown, both theoretically and practically,
that the models RB (and RD) can be used to produce,
very easily, hard random instances. More importantly, the
same result holds for instances that are forced to be satisfiable.
To perform our experimentation, we have used some of
the most efficient complete and incomplete CSP solvers. We
have also encoded some forced binary CSP instances of class
$RB(2, n, 0.8, 0.8/\ln\frac{4}{3}, p = p_{cr} = 0.25)$ with n ranging from 40
to 59 into SAT ones (using the direct encoding method) and
submitted them to the SAT competition 2004. About 50% of
the competing solvers have succeeded in solving the SAT
instances corresponding to n = 40 (d = 19 and m = 410)
whereas only one solver has been successful for n = 50
(d = 23 and m = 544).
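
For illustration, a direct encoding of a CSP instance into CNF can be sketched as follows. This is the textbook form of the direct encoding (one Boolean variable per variable-value pair, at-least-one and at-most-one clauses, and one conflict clause per unallowed tuple); whether the submitted instances included the at-most-one clauses is not stated in the text, so that part is an assumption, as are the function name `direct_encoding` and the (scope, nogoods) input format.

```python
def direct_encoding(n, d, constraints):
    """Encode a CSP with n variables of domain size d into CNF clauses.

    constraints is a list of (scope, nogoods) pairs. Clauses are lists of
    non-zero integers in the usual DIMACS convention.
    """
    def lit(x, a):
        return x * d + a + 1              # Boolean variable "x takes value a"

    clauses = []
    for x in range(n):
        clauses.append([lit(x, a) for a in range(d)])       # at least one value
        for a in range(d):
            for b in range(a + 1, d):
                clauses.append([-lit(x, a), -lit(x, b)])    # at most one value
    for scope, nogoods in constraints:
        for tup in sorted(nogoods):
            # one conflict clause per unallowed tuple
            clauses.append([-lit(x, a) for x, a in zip(scope, tup)])
    return clauses


# Two variables with domain {0, 1} and one constraint forbidding (0, 0).
print(direct_encoding(2, 2, [((0, 1), {(0, 0)})]))
# [[1, 2], [-1, -2], [3, 4], [-3, -4], [-1, -3]]
```

With this scheme, an instance with n variables of domain size d uses n·d Boolean variables.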

Although there are some other ways to generate hard
satisfiable instances, e.g. QWH [Achlioptas et al., 2000] or
2-hidden [Achlioptas et al., 2004] instances, we think that the
simple and natural method presented in this paper, based on
models with exact phase transitions and many hard instances,
should be well worth further investigation.
Figure 2: Mean search cost (50 instances) of solving
instances in RB(2,{20, 30, 40},0.8,3,p)

Figure 3: Mean search cost (50 instances) of solving
instances in RB(3,{16, 20, 24},1,1,p)

Figure 4: Mean search cost (50 instances) of solving
instances in $RB(2,[20..50],0.8,1.5,p_{cr} - 0.01)$

Figure 5: Non heavy-tailed regime for instances in
$RB(2,\{40, 45\},0.8,1.5,p_{cr} \approx 0.41)$

Figure 6: Mean search cost (50 instances) of solving
instances in RB(2,[20..450],0.8,1.5,p)

Figure 7: Median search cost (50 instances) of solving
instances in RB(2,{20, 30, 40},0.8,3,p) using a tabu search



